{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to Quantitative Finance\n",
    "\n",
    "Copyright (c) 2019 Python Charmers Pty Ltd, Australia, <https://pythoncharmers.com>. All rights reserved.\n",
    "\n",
    "<img src=\"img/python_charmers_logo.png\" width=\"300\" alt=\"Python Charmers Logo\">\n",
    "\n",
    "Published under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. See `LICENSE.md` for details.\n",
    "\n",
    "Sponsored by Tibra Global Services, <https://tibra.com>\n",
    "\n",
    "<img src=\"img/tibra_logo.png\" width=\"300\" alt=\"Tibra Logo\">\n",
    "\n",
    "\n",
    "## Module 1.5: Bayesian inference\n",
    "\n",
    "# 1.5.1 Bayesian Statistics Introduction\n",
    "\n",
    "Classical / frequentist statistics interprets a probability as a limit of many identical trials. In the strict frequentist viewpoint, a statement such as \"The probability that party X will win the election\" is meaningless; the election will only take place once, under one set of circumstances, and will not be repeated -- so the outcome cannot be observed repeatedly and the probability of this cannot be computed. Likewise, the strict frequentist perspective is that it is meaningless to discuss probabilities associated with model parameters and other fixed but unknown quantities, such as whether it is raining right now (if you are holed up in a building). It is either raining or it is not.\n",
    "\n",
    "In practice, it is useful to be able to reason quantitatively about events that cannot and will not be repeated, or about model parameters that are fixed but unobserved, or about quantities that are fixed (deterministic) but unknown. This requires either a frequentist methodology combined with inaccurate statistical reasoning (which is rife) or a Bayesian interpretation of probability.\n",
    "\n",
    "In the Bayesian perspective, a probability encodes a degree of certainty about a proposition. It is \"subjective\": you and I may have different beliefs about a proposition. With the proposition \"it is raining\", if you heard yesterday's weather forecast, and I didn't, we may assign different probabilities the same proposition that it is raining right now.\n",
    "\n",
    "\n",
    "# Introduction: We all think \"Bayesian\"\n",
    "\n",
    "The Bayesian method requires that we be explicit about our uncertainty up-front and update our beliefs in the light of new evidence or data.\n",
    "\n",
    "Features of Bayesian inference:\n",
    "- Probability encodes our degree of certainty or uncertainty.\n",
    "- Unknown parameters are modelled as probability distributions.\n",
    "\n",
    "\n",
    "### Why use Bayesian inference?\n",
    "\n",
    "1. Because it is correct. Bayes' theorem is a theorem. If you accept the axioms of probability theory, it is true. You can ignore it, but it won't go away.\n",
    "\n",
    "2. Because it is useful for applications.\n",
    "3. You can incorporate your existing knowledge into the model, known as a prior.\n",
    "\n",
    "This last point is interesting. If you have some *a priori* knowledge of the event (either intuition or something more formal), you can set this as a prior in your model. This leads to faster training of your model (assuming your prior is somewhat accurate) and can lead to the discover of better models.\n",
    " \n",
    "\n",
    "Applications of Bayesian inference include:\n",
    "\n",
    "- prediction (e.g. outcome of an election, behaviour of asset prices, ...)\n",
    "- classification (does a person with these symptoms have this disease?; OCR; speech recognition)\n",
    "- scientific inference\n",
    "- hypothesis testing (single and multiple)\n",
    "- modelling\n",
    "- uncertainty analysis\n",
    "- physics: thermodynamics, crystallography, astronomy, ...\n",
    "- biology and medicine: bioinformatics (sequencing), predicting drug targets\n",
    "- ranking for information retrieval, spam detection, ... and other computer science applications\n",
    "\n",
    "\n",
    "### How to do Bayesian inference?\n",
    "\n",
    "1. Method 1: **mathematical analysis**. This is exact but requires heroic integration skills, which makes it hard to begin in the field. A more important drawback is that it tends to be intractable for complex models.\n",
    "\n",
    "2. Method 2: **probabilistic programming**. Computing power is cheap! Using simulation allows models to be more complex."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Probability as logic\n",
    "\n",
    "Bayesian statistics was previously called the \"method of inverse probability\" -- reasoning backwards from what we observe about the world to make quantitative inferences. Here is how [Jaynes 2003] introduces its relationship to other forms of inference.\n",
    "\n",
    "### Deductive reasoning\n",
    "\n",
    "Deductive reason can be analyzed into repeated application of two syllogisms:\n",
    "\n",
    "<hr>\n",
    "\n",
    "<center>\n",
    "If A is true, then B is true.\n",
    "\n",
    "A is true.\n",
    "\n",
    "Therefore, B is true.\n",
    "</center>\n",
    "\n",
    "and its inverse:\n",
    "\n",
    "<center>\n",
    "If B is false, then A is false.\n",
    "\n",
    "B is false.\n",
    "\n",
    "Therefore, A is false.\n",
    "</center>\n",
    "\n",
    "<hr>\n",
    "\n",
    "Note that these are only valid for the true/false values above. For instance, this reasoning in **invalid**: \"If A is true, then B is true -> A is false, therefore B is false\".\n",
    "\n",
    "\n",
    "### Intuitive reasoning\n",
    "\n",
    "As [Jaynes 2003] notes, we would like to use this kind of reasoning all the time, but most of the time **we do not have the right kind of information**. We fall back on weaker syllogisms like:\n",
    "\n",
    "<hr>\n",
    "\n",
    "If A is true, then B is true.\n",
    "\n",
    "B is true.\n",
    "\n",
    "Therefore, A becomes more plausible.\n",
    "\n",
    "<hr>\n",
    "\n",
    "\n",
    "\n",
    "**Example: probability of rain**\n",
    "\n",
    "A $\\equiv$ it will start to rain by 10am at the latest.\n",
    "\n",
    "B $\\equiv$ the sky will become cloudy before 10am.\n",
    "\n",
    "If you notice clouds at 9:45am, your common sense obeys the weak syllogism above. You are more likely to take an umbrella.\n",
    "\n",
    "What is the logical connection?\n",
    "\n",
    "rain $\\implies$ clouds\n",
    "\n",
    "Notice that the physical cause is in the opposite direction:\n",
    "\n",
    "clouds $\\implies$ rain\n",
    "\n",
    "Important: the **logical connection** is what is most relevant to our inference.\n",
    "\n",
    "### Weak syllogism 2\n",
    "\n",
    "There are other weaker syllogisms that we use every day in our reasoning, such as:\n",
    "\n",
    "<hr>\n",
    "\n",
    "If A is true, then B becomes more plausible.\n",
    "\n",
    "A is false.\n",
    "\n",
    "Therefore, B becomes less plausible.\n",
    "\n",
    "<hr>\n",
    "\n",
    "**Example: disease symptoms**\n",
    "\n",
    "Symptoms of the Zika virus include conjunctivitis (red eyes) and skin rash.\n",
    "\n",
    "This traveller doesn't have conjunctivitis or skin rash. Therefore, it is less likely that he/she has the Zika virus.\n",
    "\n",
    "### Weak syllogism 3\n",
    "\n",
    "<hr>\n",
    "\n",
    "If A is true, then B becomes more plausible.\n",
    "\n",
    "B is true.\n",
    "\n",
    "Therefore, A becomes more plausible.\n",
    "\n",
    "<hr>\n",
    "\n",
    "**Example:**\n",
    "\n",
    "Aston-Martin owners are usually rich.\n",
    "\n",
    "Bill Gates is rich.\n",
    "\n",
    "Therefore, it is more likely that Bill Gates owns an Aston-Martin.\n",
    "\n",
    "### Aside\n",
    "\n",
    "There are many more complex weak syllogisms that we use easily and intuitively in everyday reasoning. Polya [1945, 1954] wrote three books about plausible reasoning, pointing out many interesting examples and showing that we do plausible reasoning by applying definite rules.\n",
    "\n",
    "### Meaning for probability theory\n",
    "\n",
    "These principles may be made **quantitative**, with useful applications.\n",
    "\n",
    "How? With the Cox-Jaynes interpretation of probability theory and **Bayes theorem**.\n",
    "\n",
    "[See Terenin and Draper, \"Cox's Theorem and the Jaynesian Interpretation of Probability\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bayes' theorem\n",
    "\n",
    "- $X$ = prior information\n",
    "- $H$ = hypothesis or model parameters\n",
    "- $D$ = data\n",
    "\n",
    "$$P(H | DX) = P(H | X) \\frac{P(D | HX)}{P(D | X)}$$\n",
    "\n",
    "The vertical bar means \"conditional upon\". Bayes' theorem follows simply from the definition of conditional probability: $P ( A | B ) = \\frac{P (A \\cap B)}{P(B)}$.\n",
    "\n",
    "We call these things:\n",
    "\n",
    "$$ \\textrm{posterior} = \\textrm{prior} \\times \\frac{\\textrm{likelihood}}{\\textrm{evidence}}$$\n",
    "\n",
    "### Prior information\n",
    "\n",
    "It is important to specify the **prior information** carefully before we have a well-posed problem. Then this equation tells us what probabilities we need to find in order to see what conclusions are justified by our evidence.\n",
    "\n",
    "Often this is regarded as a nuisance. It is, however, an **opportunity** to improve models, especially when data is sparse.\n",
    "\n",
    "### Data sparsity\n",
    "\n",
    "Often, in the real world, data is small / sparse / noisy / expensive to collect.\n",
    "\n",
    "See \"N is never large\", Andrew Gelman: http://andrewgelman.com/2005/07/31/n_is_never_larg/\n",
    "\n",
    "As you have more data, the posterior drifts away from the prior, **provided the prior is uninformed / flat enough**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "1"
    }
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "from IPython.core.pylabtools import figsize\n",
    "import numpy as np\n",
    "from matplotlib import pyplot as plt\n",
    "figsize(11, 9)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bayesian updating\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "3"
    }
   },
   "outputs": [],
   "source": [
    "%run -i bayesian_updating_plot.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Exercises\n",
    "\n",
    "Review the code for `bayesian_updating_plot.py` and make the following changes:\n",
    "\n",
    "1. Run the analysis for 5000 trails for the last plot\n",
    "2. Rerun the analysis with a biased coin (i.e. one with an 80% chance of heads). Observe how the Bayesian analysis probabilities change in the new plots."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*For solutions, see `solutions/bayesian_updating_2.py`*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Outline of solving problems with Bayesian reasoning\n",
    "\n",
    "Tim Salimans, the winner of the Kaggle contest \"Observing Dark Words\", used Bayesian inference to find the best locations for halos. His solution is presented well in \"Bayesian Methods for Hackers\", chapter 5.\n",
    "\n",
    "Summary:\n",
    "\n",
    "### Steps to doing Bayesian inference\n",
    "\n",
    "1. Construct a prior distribution $p(H|X)$ for what you are predicting ($H$) -- i.e. formulate our expectations about the thing before looking at the data.\n",
    "\n",
    "2. Construct a probabilistic model for the data given your hypothesis / parameters and prior information: $p(D | HX)$.\n",
    "\n",
    "3. Use Bayes’ rule to calculate the posterior distribution $P(H|DX)$ of the hypothesis. In other words, use the data to determine a **probability distribution** over the unknowns.\n",
    "\n",
    "4. Minimize the expected loss with respect to the posterior distribution over the predictions for\n",
    "$h$: \n",
    "\n",
    "$$\n",
    "\\hat{h} = \\mathrm{argmin}_\\mathrm{prediction} \\mathbb{E} [L(\\mathrm{prediction}, h)]\n",
    "$$\n",
    "\n",
    "i.e. tune our predictions to be as good as possible for the given error metric.\n",
    "\n",
    "This may be the maximum a-posteriori (\"MAP\") estimate or something else.\n",
    "\n",
    "### Applications which benefit from strong prior information\n",
    "\n",
    "- Learning when a self-driving car should brake from LIDAR data\n",
    "- Learning the effectiveness of a medicine from clinical data\n",
    "- Learning the elasticity of demand from economic data\n",
    "- Learning the structure of a distant galaxy from telescopic data\n",
    "- Learning whether to accept or reject a box of widgets from industrial QC control data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Resources\n",
    "\n",
    "### Books\n",
    "\n",
    "- Edwin T. Jaynes: \"Probability Theory: The Logic of Science\". (Cambridge University Press). A masterpiece. Deep and very interesting. \n",
    "\n",
    "- Michael I Jordan's lecture notes (UC Berkeley): https://people.eecs.berkeley.edu/~jordan/courses.html\n",
    "\n",
    "- Bayesian modeling and inference, Spring 2010\n",
    "- Practical machine learning, Fall 2009\n",
    "\n",
    "- D. S. Sivia with J Skilling: \"Data analysis: A Bayesian tutorial\"\n",
    "\n",
    "- MacKay: Information Theory, Inference, and Learning Algorithms\n",
    "\n",
    "- John Kruschke: Doing Bayesian Data Analysis. 2nd edition switches to JAGS and Stan (from BUGS). \"The dog book\" for the illustration of dogs on the cover.\n",
    "\n",
    "- Andrew Gelman et al. Bayesian Data Analysis. CRC Press (3rd edition). The most influential and widely used Bayesian text by statisticians.\n",
    "\n",
    "- \"Probabilistic Programming and Bayesian Methods for Hackers\": Pearson CMG. Useful because it serves as more accessible documentation to PyMC. (The official PyMC docs assume prior knowledge of Bayesian inference.)\n",
    "\n",
    "PyMC3 examples for PPBMH are available on GitHub: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers\n",
    "\n",
    "### Software for Bayesian modelling\n",
    "\n",
    "- Stan: http://mc-stan.org. Upcoming successor to BUGS / JAGS. Compiles models to C++. Uses Hamiltonian Monte Carlo for posterior sampling.\n",
    "\n",
    "- PyMC3: http://pymc-devs.github.io/pymc3/\n",
    "\n",
    "- Edward: http://edwardlib.org. Variational inference package for Python built on TensorFlow."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
